Issues in the addition of ISO standard annotations to the Switchboard corpus

Authors

  • Harry Bunt
  • Alex C. Fang
  • Xiaoyue Liu
  • Jing Cao
  • Volha Petukhova
Abstract

This paper analyzes the issues that arise when trying to add annotations to the dialogues in the Switchboard corpus according to ISO standard 24617-2, exploiting the existing SWBD-DAMSL annotations. These issues relate to differences between the two tag sets; to the highly multidimensional view that underlies the ISO standard; to differences in segmenting the dialogues into functional units; to the use of in-line markup for certain phenomena in Switchboard; and to the use of intra-dialogue dependence relations as defined in the ISO standard. The analysis is supplemented by a discussion of how the existing annotations may help to create, semi-automatically, a fully fledged ISO standard annotation alongside the existing SWBD-DAMSL annotation.

Similar resources

Collaborative Annotation of Dialogue Acts: Application of a New ISO Standard to the Switchboard Corpus

This article reports some initial results from collaborative work on converting the SWBD-DAMSL annotation scheme used in the Switchboard Dialogue Act Corpus to the ISO DA annotation framework, as part of our ongoing research on the interoperability of standardized linguistic annotations. A qualitative assessment of the conversion between the two annotation schemes was performed to verify the appli...

Many Uses, Many Annotations for Large Speech Corpora: Switchboard and TDT as Case Studies

This paper discusses the challenges that arise when large speech corpora receive an ever-broadening range of diverse and distinct annotations. Two case studies of this process are presented: the Switchboard Corpus of telephone conversations and the TDT2 corpus of broadcast news. Switchboard has undergone two independent transcriptions and various types of additional annotation, all carried out ...

The NXT-format Switchboard Corpus: a rich resource for investigating the syntax, semantics, pragmatics and prosody of dialogue

This paper describes a recently completed common resource for the study of spoken discourse, the NXT-format Switchboard Corpus. Switchboard is a long-standing corpus of telephone conversations (Godfrey et al., 1992). We have brought together transcriptions with existing annotations for syntax, disfluency, speech acts, animacy, information status, coreference, and prosody; along with substantial...

The DialogBank

This paper presents the DialogBank, a new language resource consisting of dialogues with gold standard annotations according to the ISO 24617-2 standard. Some of these dialogues have been taken from existing corpora and have been re-annotated according to the ISO standard; others have been annotated directly according to the standard. The ISO 24617-2 annotations have been designed according to ...

Publication date: 2013